Efficient Density Clustering Method for Large Spatial Data Using HOBBit Rings

Authors

  • Fei Pan
  • Baoying Wang
  • Yi Zhang
  • Dongmei Ren
  • Xin Hu
  • William Perrizo
Abstract

Data mining for spatial data has become increasingly important as more and more organizations are exposed to spatial data from sources such as remote sensing, geographical information systems (GIS), astronomy, computer cartography, environmental assessment and planning, and bioinformatics. Recently, density-based clustering methods such as DENCLUE, DBSCAN, and OPTICS have been published and recognized as powerful clustering methods for data mining. These approaches have a run time complexity of O(n log n) when using spatial index techniques such as R-trees and grid cells. However, these methods are known to lack scalability with respect to dimensionality. In this paper, we develop a new efficient density-based clustering algorithm using HOBBit metrics and P-trees. The fast P-tree ANDing operation facilitates the calculation of the density function within HOBBit rings. The average run time complexity of our algorithm for spatial data in d dimensions is O(dn√n). Our proposed method has cardinality scalability comparable to that of other density methods for small and medium-sized data sets, but superior dimensional scalability.
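The abstract does not define the HOBBit metric or the ring construction, so the following is only a minimal sketch under stated assumptions. It assumes the commonly cited definition of the HOBBit (High Order Bisection) distance: for two non-negative integers, the 1-based position of the highest bit in which they differ (0 if they are equal), with the multi-dimensional distance taken as the maximum over dimensions. The function names and the sample points are hypothetical, and the counting is done by brute force rather than with P-tree ANDing, which is what the paper relies on for speed.

```python
from collections import Counter


def hobbit_distance_1d(a: int, b: int) -> int:
    """Assumed 1-D HOBBit distance: 1-based position of the highest
    differing bit of two non-negative integers, or 0 if they are equal."""
    return (a ^ b).bit_length()


def hobbit_distance(x, y) -> int:
    """Assumed multi-dimensional HOBBit distance: maximum of the
    per-dimension distances."""
    return max(hobbit_distance_1d(a, b) for a, b in zip(x, y))


def ring_counts(center, points) -> Counter:
    """Brute-force count of points per HOBBit ring around `center`.
    Ring k holds the points whose HOBBit distance to the center is exactly k.
    (Illustrative only; the paper computes such counts with P-trees.)"""
    return Counter(hobbit_distance(center, p) for p in points)


# Hypothetical sample data, not taken from the paper.
points = [(5, 5), (4, 5), (7, 6), (12, 3), (5, 4)]
counts = ring_counts((5, 5), points)
```

Under these assumptions, a density estimate at a point could weight the per-ring counts so that nearer rings contribute more; the weighting actually used in the paper is not given in this abstract.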

Similar resources

A density based clustering approach to distinguish between web robot and human requests to a web server

Today, the world's dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Regardless of the advantages of robots, they may occupy bandwidth and reduce the performance of web servers. Despite a variety of research, there is no accurate method for classifying huge data ...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Evaluation of Updating Methods in Building Blocks Dataset

With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one solution to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...

Clustering Algorithm for 2D Multi-Density Large Dataset Using Adaptive Grids

Clustering is a key data mining problem. Density-based clustering algorithms have recently gained popularity in the data mining field. The density- and grid-based technique is a popular way to mine clusters in large spatial datasets, wherein clusters are regarded as regions denser than their surroundings. The attribute values and ranges of these attributes characterize the clusters. In this paper we a...

Large-Scale Disease Clustering and Its Application in Epidemiology and Health Studies

Spatial autocorrelation statistics provide summary information about the spatial arrangement of data in a map. In fact, these statistics compare neighboring area values in order to assess the level of large scale clustering. Whenever a large number of neighboring areas have either relatively large or relatively small values, large scale clustering may be detected. Detecting such clustering is a...

Publication date: 2003